Biological and technological networks contain patterns, termed network motifs, which occur far 
more often than in randomized networks. Network motifs were suggested to be elementary building 
blocks that carry out key functions in the network. It is of interest to understand how network 
motifs combine to form larger structures. To address this, we present a systematic approach to 
define 'motif generalizations': families of motifs of different sizes that share a common architectural 
theme. To define motif generalizations, we first define 'roles' in a subgraph according to structural 
equivalence. For example, the feedforward loop triad, a motif in transcription, neuronal and some 
electronic networks, has three roles, an input node, an output node and an internal node. The 
roles are used to define possible generalizations of the motif. The feedforward loop can have three 
simple generalizations, based on replicating each of the three roles and their connections. We present 
algorithms for efficiently detecting motif generalizations. We find that the transcription networks 
of bacteria and yeast display only one of the three generalizations, the multi-output feedforward 
generalization. In contrast, the neuronal network of C. elegans mainly displays the multi-input 
generalization. Forward-logic electronic circuits display a multi-input, multi-output hybrid. Thus, 
networks which share a common motif can have very different generalizations of that motif. Using 
mathematical modelling, we describe the information processing functions of the different motif 
generalizations in transcription, neuronal and electronic networks. 



PACS numbers: 05, 89.75 



I. INTRODUCTION 



A major current challenge is to understand the
function of biological information-processing networks.
These networks, as well as networks from engineering,
ecology, and other fields, were recently found to contain
network motifs: small subgraphs that occur in the network
far more often than in randomized networks. Each class of
networks was found to have a characteristic set of network
motifs [16]. Information processing networks, such as
gene regulation networks, neuronal networks, and
some electronic circuits, were found to share many of the
same network motifs. Recently, in the case of the
transcription network of the bacterium E. coli, network
motifs were shown theoretically and experimentally to
function as elementary building blocks of the network,
each performing specific information-processing tasks.
For example, one of the most significant
motifs shared by biological information processing
networks is the feedforward loop (FFL). In transcription
networks, the feedforward loop with positive regulations
was shown to act as a 'persistence detector' circuit
that rejects transient activation signals yet allows rapid
response to inactivation signals [18, 19]. A second
motif, the single-input module, was shown to generate
a temporal order of gene expression, which correlates
with the functional order of the genes in the pathway
[21]. A third major motif, the bi-fan, which is the
building block of dense arrays of overlapping regulation,
performs hard-wired combinatorial decisions governed
by the input functions of the output genes [23, 24, 25].

Here, we address the question of whether a given
network motif appears independently in the network or
whether instances of the motif combine to form larger
structures. If the latter occurs, what is the function
of these larger structures? Do different networks
that share a certain network motif also share the same
structural combinations of that motif? These questions
require analysis of large subgraphs, a computationally
difficult problem [29]. Recently, efficient algorithms
for counting subgraphs based on sampling have
been introduced [27]. These algorithms can at present be
effectively used to detect motifs of up to 7-8 nodes. To
go beyond this requires an approach to efficiently define
and detect large structures whose architecture is based
on a given motif.

To address these issues, we present an approach for 
uniting related groups of motifs of different sizes into 
families termed motif generalizations. This allows gener- 
alizing from small motifs to the larger complexes in which 
they appear. We present an efficient algorithm to detect 
motif generalizations. We find that networks that share 
the same motif can have different generalizations of that 
motif. For example, we find different generalizations of 
the FFL motif in transcription, neuronal and electronic 
networks. Using mathematical models we analyze the 
information-processing functions of the FFL generaliza- 
tion that is selected in each of these networks. 
FIG. 1: a. A directed 3-node subgraph (triad). b. This triad has two roles. c. Roles in all the 13 types of connected triads. In
each triad there are between one and three roles.



II. RESULTS 
A. Node Roles in a subgraph 

We begin by defining roles of nodes in a subgraph. A 
group of nodes in a subgraph share the same role if there 
is a permutation of these nodes, together with their cor- 
responding edges, that preserves the subgraph structure 
(See APPENDIX A for formal definitions). For example,
in the v-shaped subgraph in Fig. 1a, nodes b and c can
be permuted leaving the structure intact, whereas nodes
a and b cannot. Thus, this subgraph has two roles, role
1 and role 2 (Fig. 1b). The FFL has three roles (Fig.
1c, triad 6), whereas the 3-loop (Fig. 1c, triad 7) has only
one role (because a cyclic permutation of the three nodes
preserves its structure). The thirteen possible connected
directed triads have between one and three roles each
(Fig. 1c), with a total of 30 different roles.
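
The role counts quoted above can be checked by brute force. The following Python sketch is an illustration only (it is not the authors' code, and all function names are hypothetical): it enumerates every weakly connected directed triad without self-loops, groups isomorphic triads, and counts roles as orbits of the automorphism group.

```python
from itertools import permutations, product

NODES = (0, 1, 2)
# The 6 possible directed edges between 3 nodes (no self-loops).
PAIRS = [(u, v) for u in NODES for v in NODES if u != v]


def relabel(edges, perm):
    """Apply a node permutation (perm[u] is the image of u) to an edge set."""
    return frozenset((perm[u], perm[v]) for u, v in edges)


def is_connected(edges):
    """On 3 nodes, a graph is weakly connected iff at least 2 node pairs are linked."""
    return len({frozenset(e) for e in edges}) >= 2


def canonical(edges):
    """Smallest relabelled edge list; identical for isomorphic triads."""
    return min(tuple(sorted(relabel(edges, p))) for p in permutations(NODES))


def roles(edges):
    """Partition the nodes into orbits of the automorphism group (the roles)."""
    edges = frozenset(edges)
    autos = [p for p in permutations(NODES) if relabel(edges, p) == edges]
    return {frozenset(p[v] for p in autos) for v in NODES}


def triad_census():
    """Map each connected triad (up to isomorphism) to its number of roles."""
    census = {}
    for mask in product((0, 1), repeat=len(PAIRS)):
        edges = frozenset(e for e, bit in zip(PAIRS, mask) if bit)
        if is_connected(edges):
            census[canonical(edges)] = len(roles(edges))
    return census


if __name__ == "__main__":
    census = triad_census()
    print(len(census), "connected triads,", sum(census.values()), "roles in total")
```

Calling `roles` on the v-shaped subgraph `{(0, 1), (0, 2)}`, on the FFL `{(0, 1), (0, 2), (1, 2)}` and on the 3-loop `{(0, 1), (1, 2), (2, 0)}` gives the two, three and one roles described in the text.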



B. Subgraph Topological Generalizations 

We now define subgraph topological generalizations 
based on node roles. Subgraph topological generaliza- 
tions are extensions of a subgraph to a family of larger 
subgraphs which share its basic structure. Consider the 
FFL (Fig. 2a). For this 3-node subgraph we define three 
simple generalizations to the level of 4 nodes (Fig. 2b). 



In each simple generalization a single role and its
connections are duplicated. In the first simple generalization,
the X role and its connections are duplicated. This
generalization is termed double-X FFL or double-input FFL.
The other two generalizations are obtained by duplicating
the Y or Z roles. This replication process can be
continued, leading to higher-order motif generalizations,
the multi-X (multi-input), multi-Y and multi-Z
(multi-output) FFL generalizations (Fig. 2c).
More complex generalizations can be obtained by repli- 
cating more than one of the roles. For example, dupli- 
cating both the X and Z roles yields five-node general- 
izations (Fig 2d). When replicating more than one role 
(and in some cases replicating even a single role), one 
can define two kinds of generalizations: in strong gen- 
eralizations, every X,Y,Z triplet forms a FFL. In weak 
generalizations, every node participates in at least one 
FFL, but not all possible FFLs are formed (Fig. 2d). 
This procedure of generalization can be applied to any 
subgraph (see formal definition in APPENDIX B). For
example, simple generalizations of the 4-node bi-fan are
shown in Fig. 2e-g. We now describe the statistical
significance of the generalizations of the motifs found in
various networks.
FIG. 2: a. The feedforward loop triad has three roles: X (input node), Y (internal node, secondary input) and Z (output 
node). b. 4-node simple generalizations of the feedforward loop. The X node is duplicated to form the double-X generalization. 
The Y and Z nodes are duplicated to form the double-Y and double-Z generalizations respectively. c. Simple multi-node 
generalizations of the FFL. d. Strong and weak generalization rules. A 5-node generalization of the FFL with two X nodes, 
one Y node, and two Z nodes. In the strong generalization every combination of an X,Y,Z triplet of nodes forms an FFL. e. The 
bi-fan, a 4-node motif with two roles X (input role) and Y (output role). f. 5-node simple generalizations of the bi-fan. In 
each of the two generalizations one of the two roles is duplicated. g. Simple multi-node generalization of the bi-fan: an X or 
Y node is replicated to form the multi-input or multi-output bi-fan generalization respectively. 
FIG. 3: Statistical significance of motif generalizations. The cumulative number of multi-Z FFLs in the real network 
(black) and randomized networks, mean ± SD (grey), in a. the E. coli transcription network, b. the S. cerevisiae transcription 
network. c. The cumulative number of multi-X FFLs in the real and randomized networks (mean ± SD) in the C. elegans 
neuronal network. 



                                             Transcriptional networks      Neurons        Electronic chips
Generalization              Subgraph size    E. coli        yeast          C. elegans     S15850
--------------------------  -------------    -----------    -----------    -----------    ----------------
basic bi-fan                4 (2X,2Y)        + (N=209)      + (N=1812)     + (N=126)      + (N=1040)
multi output                5 (2X,3Y)        + (N=264)      + (N=14857)    + (N=152)      + (N=1990)
                            6 (2X,4Y)        + (C=0.015)    + (C=3.5)      + (C=0.17)     + (C=0.28)
multi input                 5 (3X,2Y)        + (N=20)       + (N=81)       + (N=25)       + (N=226)
                            6 (4X,2Y)        - (N=0)        + (N=14)       + (C=0.015)    + (C=0.001)
equal multi input-outputs   6 (3X,3Y)        + (N=6)        + (N=21)       - (N=0)        + (N=301)

TABLE I: Bi-fan generalizations in different networks. (aX,bY) represents the multiplicity of each of the roles in 
the generalization (Fig. 2g). '+': statistically significant generalizations, '-': non-significant generalizations. Number of 
appearances (N), or concentration (C, scaled by a negative power of ten), are listed. 



C. Network Motifs Topological Generalizations 

While enumerating all subgraphs of a given size is
a difficult task, enumerating generalizations of a given
subgraph can be performed efficiently by an algorithm
described in APPENDIX C. The algorithm is based
on using the appearances of the basic subgraph as
nucleation points for a search for its generalizations. As
an example, we applied this algorithm to networks in
which the FFL and bi-fan are motifs, to ask whether
any of the possible FFL or bi-fan generalizations occur
significantly in the networks (APPENDIX C). In the
transcription networks of E. coli [15] and S. cerevisiae
we find that the multi-Z FFL generalization is
highly significant (Fig. 3a, b). The other two possible
simple generalizations are not significant (in the E. coli
network, multi-X's and multi-Y's do not occur at all; in
the S. cerevisiae network both appear only twice). An
example of a multi-Z FFL in the E. coli transcription
network, the maltose utilization system, is shown in Fig.
4a. In each multi-Z FFL, the different genes (Z roles)
share a common biological function (as shown in tables
2 and 3 that list all multi-Z FFL complexes in the E.
coli and S. cerevisiae networks).

In the network of synaptic connections between
neurons in C. elegans [14], we find a different
FFL generalization: the multi-X FFL (Fig. 3c). This
structure occurs 29 times in the network, with up to
4 inputs. Multi-Y and multi-Z FFLs are found in far
smaller numbers (double-Y and double-Z FFLs appear
3 times each) [32]. An example of a multi-X FFL in the
locomotion control circuit of C. elegans is shown in Fig.
4b.

In networks of connections between logic gates in
forward-logic electronic chips [13, 33] we find no simple
generalization of the FFL. These electronic circuits
do, however, show a complex FFL generalization: a
structure with two Xs, a single Y and two Zs (a weak
generalization, Fig. 4c). In the five forward-logic electronic
chips we have analyzed, 70 percent to 100 percent of the
FFLs are embedded in instances of this 5-node structure.

The most prominent 4-node network motif in these
networks is the bi-fan (Fig. 2e). The bi-fan has
two roles and therefore two simple generalizations (Fig.
2g). We find that both simple generalizations of the
bi-fan (multi-output and multi-input) are significant in the
transcription, neuronal and electronic networks (Table
1). The multi-output bi-fan generalizations are more
significant, and the maximal Y multiplicity is higher than
the maximal X multiplicity in all these networks. In these
networks we find multi-output bi-fan structures with
10 Ys and more, while multi-input bi-fans do not exceed
6 input X nodes.

FIG. 4: The FFL generalizations found in biological
and technological networks. a. An example of a three-Z
FFL in the transcription network of E. coli, the maltose
utilization system. The activator CRP senses glucose starvation,
MalT senses maltotriose, and malEFG, malK and malS
participate in maltose metabolism and transport. b. An example
of a double-X FFL in the locomotion neural circuit of C.
elegans. AVA and AVD are ventral cord command interneurons.
AVD functions as modulator for backward locomotion. AVA
functions as driver cell for backward locomotion. ASH and
FLP are head sensory neurons sensitive to noxious chemicals
and nose touch. c. A generalized form of the FFL (2X,Y,2Z)
found in forward-logic electronic chips. This 5-node structure
appears as a part of a 6-node module, which implements XOR
(Exclusive OR) using 4 NAND gates. d. Truth table of the
circuit described in c (a (2X,Y,2Z) FFL generalization with
an additional NAND gate at the output). There are 2 input bits
X1 and X2 and a single output bit which is equal to (X1 XOR
X2).



D. Functions of multi-output FFL generalization in 
transcription networks 

The function of the FFL depends on the signs of
the interactions (positive or negative regulation), on
their strengths and on the functions that integrate
multiple inputs into each node. In the case of positive
regulation, the 3-node FFL has been shown to function
as a persistence detector: it filters out short
input stimuli to X, and responds only to persistent
signals. On the other hand, it responds quickly to
OFF steps in the input to X. With other
sign combinations, the 3-node FFL can function as
a pulse-generator or response accelerator [35].
These functions apply to a wide range of interaction
strengths, and to both AND and OR-like input functions.

Here, we studied the functions of the generalizations
of the FFL. We begin with the multi-output FFL, which
is the generalization that is significant in transcription
networks. The multi-output FFL has a single input
node X, a single internal node Y (secondary input)
and a number of output nodes $Z_1..Z_m$ (Fig. 2c, 4a).
The arrows in the FFL diagram should be assigned
numbers representing the strength of the interaction
of the transcription factors (TFs) X and Y with the
promoters of the various Z genes [21]. These numbers
correspond to the activation or repression coefficients
of each gene (the concentration of the TF required for
50 percent effect [36]). Here, we consider for
simplicity the most common case, that of FFLs with
positive regulation [18]. We employ a simple model of
the dynamics of this circuit. $X(t)$ is the activity
of the transcription factor X, $Y(t)$ that of Y, and $Z_j(t)$ is the
concentration of the gene product $Z_j$. The dynamics of
transcription factor Y and the output gene products $Z_j$
are given by

$$\frac{dY}{dt} = F(X, T_{yx}) - aY$$

$$\frac{dZ_j}{dt} = F(X, T_{z_j x})\,F(Y, T_{z_j y}) - aZ_j$$

where $a$ is the protein degradation/dilution rate (the
inverse of the protein lifetime) and $T_{yx}$,
$T_{z_1 x}$, $T_{z_2 x}$, $T_{z_1 y}$, $T_{z_2 y}$ are the activation thresholds of
the various genes (Fig. 5a). For simplicity we use
a sharp activation function, $F(U,T) = 1$ if $U > T$
and 0 otherwise. The qualitative results apply also to
Michaelis-type activation functions. These equations
can be solved analytically, yielding piecewise exponential
dynamics in response to step-like activation profiles of
X. We find that the multi-output FFL can encode a
temporal order of expression of the Z genes, by means
of different activation thresholds $T_{z_j y}$ for each of the
output genes (Fig. 5a, b). This temporal ordering feature
is shared with another common network motif, the
single-input module [15, 21]. Indeed, high resolution
expression measurements on the flagella multi-output
FFL (in E. coli) showed that the class 2 flagella genes,
which are regulated by a feedforward loop, are activated
in a temporal order that corresponds to the functional
order of the gene products in the assembly of the flagellar
motor [40].

FIG. 5: Kinetics of a double-output FFL generalization
following pulses of stimuli. a. A double-output FFL with
positive regulation and AND-logic input function for Z1 and Z2.
Numbers on the arrows are activation thresholds. b. Simulated
kinetics of the double-output FFL in response to a short
pulse and a long pulse of X activity. The dashed and dotted
horizontal lines represent the activation thresholds $T_{z_1 y}$ and
$T_{z_2 y}$. a = 1 was used.

The timing of activation of gene j following a step
activation of X is

$$T_j = -a^{-1} \ln\left(1 - T_{z_j y}/Y_{max}\right)$$

The rise time of the different genes can be tuned by
$T_{z_j y}/Y_{max}$, where $Y_{max}$ is the maximal concentration of
Y. Note that $T_{z_j y}$ can be easily tuned during evolution,
for example by mutations in the binding site of Y
in the $Z_j$ promoter [40]. The Z gene with the
lowest activation threshold is turned on first after the
stimulation of X. Furthermore, the multi-Z FFL can
act as a persistence detector for all of the output genes
(Fig. 5b): the Z genes are expressed only if the input
stimulus to X is present for a long enough time. The
minimal time that a saturating X stimulus needs to
be present to activate gene j is equal to $T_j$. Thus this
FFL generalization preserves the functionality of the
original FFL motif. The turn-off order of the Z genes
upon a gradual decay of X activity can be separately
controlled by the activation coefficients of the X TF,
$T_{z_j x}$ [40]. Thus different turn-on and turn-off orders of
the $Z_j$ genes can in principle be achieved. In summary,
the multi-output FFL preserves the functionality of
the simple FFL, and in addition can encode temporal
expression programs among the different Z genes.
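
The behaviour described in this section can be reproduced numerically. The sketch below is a minimal forward-Euler integration of the model with the sharp activation function, written here for illustration and not taken from the paper. The threshold values (0.5 and 0.8 for the two Z genes, 0.5 for the X thresholds), the time step and the function names are all assumptions chosen for the example; a = 1 follows the value quoted for Fig. 5.

```python
import math


def step(u, threshold):
    """Sharp activation function F(U, T): 1 if U > T, otherwise 0."""
    return 1.0 if u > threshold else 0.0


def simulate_multi_output_ffl(pulse_duration, z_thresholds, a=1.0,
                              t_yx=0.5, t_zx=0.5, dt=1e-3, t_end=8.0):
    """Integrate dY/dt = F(X,Tyx) - aY and dZj/dt = F(X,Tzx)F(Y,Tzjy) - aZj.

    X is a square pulse of height 1 lasting `pulse_duration`.
    Returns the time at which production of each Z_j first switches on
    (None if it never does).
    """
    y = 0.0
    z = [0.0] * len(z_thresholds)
    onset = [None] * len(z_thresholds)
    for k in range(int(round(t_end / dt))):
        t = k * dt
        x = 1.0 if t < pulse_duration else 0.0
        for j, t_zy in enumerate(z_thresholds):
            production = step(x, t_zx) * step(y, t_zy)  # AND-like input function
            if production and onset[j] is None:
                onset[j] = t
            z[j] += dt * (production - a * z[j])
        y += dt * (step(x, t_yx) - a * y)
    return onset


def predicted_onset(t_zy, a=1.0):
    """T_j = -(1/a) ln(1 - T_zjy / Y_max), with Y_max = 1/a for this model."""
    y_max = 1.0 / a
    return -math.log(1.0 - t_zy / y_max) / a


if __name__ == "__main__":
    thresholds = [0.5, 0.8]  # hypothetical T_z1y and T_z2y
    print("long pulse :", simulate_multi_output_ffl(3.0, thresholds))
    print("predicted  :", [predicted_onset(t) for t in thresholds])
    print("short pulse:", simulate_multi_output_ffl(0.5, thresholds))
```

With these assumed parameters the gene with the lower threshold switches on first, the simulated onset times agree with the formula for $T_j$ to within the step size, and a pulse shorter than the smallest $T_j$ activates neither gene.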



E. Functions of multi-input FFL generalization in 
neuronal networks 

A different FFL generalization, the multi-input FFL, is
found in the neuronal network of C. elegans. In general,
the function of this circuit depends on the signs on the
arrows and on two input functions (gates): one input
function integrates the multiple X inputs to Y, and the
other integrates the inputs from Y and $X_1..X_m$ to Z
(Fig. 6a).

We analyzed the dynamics of one possible two-input
FFL, where the input function governing the Y node
is an OR gate, $X_1$ OR $X_2$, and the input function
of the Z node is Y AND ($X_1$ OR $X_2$) (Fig. 6a,b,c).
This choice of input functions ensures that both Y and
either $X_1$ or $X_2$ are needed for Z to be activated to
a level that allows activation of its downstream
(post-synaptic) neurons or muscle cells (as is the case, for
example, in the circuit of Fig. 4b, in which ablation
of the neuron AVD results in loss of sensory input to
the neuron AVA). These input functions could
in principle be implemented by simple neurons which
integrate weighted inputs. The input function of Z,
for example, represents strong synapses from Y and
weaker ones from $X_1$ and $X_2$. It is important to note
that the simplest equations that describe transcription
networks also describe neurons with graded potential
and no spiking (as C. elegans neurons are thought to be).
In the case of neurons, $X_i(t)$, $Y(t)$ and $Z(t)$
represent neuron membrane potentials. The activation
dynamics of the circuit in Fig. 6a are

$$\frac{dY}{dt} = F(X_1 + X_2, T_{yx}) - aY$$

$$\frac{dZ}{dt} = F(Y, T_{zy})\,F(X_1 + X_2, T_{zx}) - aZ$$

Here $a$ is the relaxation rate of the neurons' membrane
potential, and the synaptic activation thresholds are
$T_{yx}$, $T_{zy}$ and $T_{zx}$.

This model shows that the circuit can act as a
persistence detector for both $X_1$ and $X_2$ (Fig. 6b). In
the locomotion neuronal circuit example (Fig. 4b), the
FFL circuit could elicit backward motion only if the
stimulation of one of the sensory neurons is longer than
a threshold duration $\tau$ determined by the parameters of
the circuit.
A transient stimulation would not be enough to elicit
backward motion. Furthermore, we find that sufficiently
closely spaced short pulses of $X_1$ and $X_2$ can elicit a
response, even if each pulse alone cannot (Fig. 6c). This
highlights a 'memory-like' function of Y, which can store
information from recent stimulations over its relaxation
time. In the basic 3-node FFL, Y can store information
about recurring pulses of X. In the multi-input FFL,
Y can store information from multiple inputs (Fig. 6c
gives an example), and increase sensitivity to one input
if the other input has recently been detected. Generally,
if the summed input of the input nodes $X_i$ to node Y is
$S(t) = F(X_1 + X_2, T_{yx})$, Z is activated when Y activity
exceeds the threshold $T_{zy}$:

$$Y(t) = e^{-at} \int_0^t S(t')\, e^{a t'}\, dt' > T_{zy}$$

(where $Y(t = 0) = 0$), showing that node Y effectively
integrates the inputs over a time scale of $1/a$.
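
The coincidence-detection behaviour can be illustrated with a short numerical sketch of the double-input model. This is an illustration written for this text, not the authors' simulation: the pulse timings, the threshold values (all set to 0.5), the Euler step and the function names are assumptions, and only a = 1 is taken from the figure caption. The function tracks Y and reports the first time at which the production term of Z switches on.

```python
def step(u, threshold):
    """Sharp activation function F(U, T): 1 if U > T, otherwise 0."""
    return 1.0 if u > threshold else 0.0


def in_pulse(t, pulses):
    """Return 1.0 if time t falls inside any (start, stop) interval, else 0.0."""
    return 1.0 if any(start <= t < stop for start, stop in pulses) else 0.0


def z_onset(x1_pulses, x2_pulses, a=1.0, t_yx=0.5, t_zy=0.5, t_zx=0.5,
            dt=1e-3, t_end=8.0):
    """First time at which F(Y,Tzy) * F(X1+X2,Tzx) equals 1, or None.

    Y follows dY/dt = F(X1+X2, Tyx) - aY, integrated by forward Euler.
    """
    y = 0.0
    for k in range(int(round(t_end / dt))):
        t = k * dt
        x_sum = in_pulse(t, x1_pulses) + in_pulse(t, x2_pulses)
        if step(y, t_zy) * step(x_sum, t_zx):
            return t
        y += dt * (step(x_sum, t_yx) - a * y)
    return None


if __name__ == "__main__":
    short_x1 = [(0.0, 0.5)]
    print("single short pulse   :", z_onset(short_x1, []))
    print("well-separated pulses:", z_onset(short_x1, [(4.0, 4.5)]))
    print("closely spaced pulses:", z_onset(short_x1, [(0.7, 1.2)]))
    print("persistent X1        :", z_onset([(0.0, 3.0)], []))
```

Under these assumed parameters a single short pulse, or two short pulses separated by several relaxation times, leave Y below threshold, whereas two short pulses 0.2 time units apart activate Z because Y has not yet relaxed when the second input arrives.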



FIG. 6: Kinetics of a double-input FFL generalization
following pulses of stimuli (horizontal axis: time). a. A
double-input FFL. Input functions
for Y and Z, and the activation thresholds, are shown as gates
and numbers on the arrows. b. Simulated kinetics of the
two-input FFL, with short well-separated stimuli pulses of X1 and
X2, followed by a persistent X1 stimulus. c. Simulated kinetics
of the double-input FFL, with a short X1 stimulus followed
rapidly by a short X2 stimulus pulse. The dashed horizontal
line corresponds to the activation threshold for Y, $T_{zy}$.
a = 1 was used.



F. Function of FFL generalization in electronic 
chips 

Forward-logic electronic chips are networks in which
nodes represent logic gates. These circuits are optimized
to perform a hard-wired logical function between input
and output nodes. Forward-logic chips, taken from
an engineering database (ISCAS89), were previously
found to display the FFL network motif. Here we
find that they display a specific generalization of the
FFL, with two input and two output nodes (Fig. 4c).
Analyzing the appearances of this pattern, we find that
this 5-node generalized FFL motif is part of a commonly
used engineering module built of 4 NAND gates, which
implements XOR (exclusive OR) logic on the two inputs
(see truth table in Fig. 4d).
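
A small sketch makes the link between the XOR module and the (2X,Y,2Z) structure concrete. It uses the textbook construction of XOR from four NAND gates; the assignment of gate names to the roles X, Y and Z, and the helper names, are assumptions made for this illustration and are not read off the chips analyzed in the paper.

```python
def nand(p, q):
    """NAND of two bits."""
    return 1 - (p & q)


def xor_from_nands(x1, x2):
    """XOR built from 4 NAND gates: Y, Z1, Z2 and one output gate."""
    y = nand(x1, x2)      # internal node, fed by both inputs
    z1 = nand(x1, y)      # closes the FFL X1 -> Y -> Z1 with X1 -> Z1
    z2 = nand(x2, y)      # closes the FFL X2 -> Y -> Z2 with X2 -> Z2
    return nand(z1, z2)   # additional NAND gate at the output


# Wiring of the 6-node module as directed edges (signal source, gate).
WIRING = {
    ("X1", "Y"), ("X2", "Y"),
    ("X1", "Z1"), ("Y", "Z1"),
    ("X2", "Z2"), ("Y", "Z2"),
    ("Z1", "OUT"), ("Z2", "OUT"),
}


def feedforward_loops(edges):
    """All (x, y, z) with edges x->y, y->z and x->z."""
    return sorted((x, y, z)
                  for (x, y) in edges
                  for (y2, z) in edges
                  if y2 == y and (x, z) in edges)


if __name__ == "__main__":
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, xor_from_nands(x1, x2))
    print(feedforward_loops(WIRING))
```

In this wiring only the triplets (X1, Y, Z1) and (X2, Y, Z2) form FFLs, so every node of the 5-node structure takes part in at least one FFL while not every X,Y,Z triplet does, which is what the weak generalization rule describes.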



III. DISCUSSION 

This study presented a systematic approach for 
defining and detecting generalizations of network motifs. 
Motif generalizations are families of subgraphs of differ- 
ent sizes which share a common structural theme, and 
which appear significantly more often in the network 
than in randomized networks. The generalizations are 
produced by replicating nodes in a basic motif structure. 
The generalizations often preserve the functionality of 
the network motif on which they are based, because they 
preserve the roles of nodes in the motif (for example, 
by replicating input or output nodes). We presented an 
efficient algorithm for detecting motif generalizations. 
We find that different networks which display the same 
motifs can show very different generalizations of these 
motifs. We also demonstrated using simple models 
that these generalized motifs can carry out specific 
information processing functions. These functions can in 
principle be tested experimentally in transcription and 
neuronal systems. 

The two sensory transcription networks, from a
prokaryote (E. coli) and a eukaryote (S. cerevisiae),
showed the same generalization of the FFL: both
networks display the multi-output FFL generalization.
The other two generalizations, multi-input
and multi-Y, are not found significantly in these
transcription networks. Multi-output FFL complexes are
found throughout the transcription networks in diverse
systems (tables 2,3). The X role is usually a global
transcription factor which controls many genes, the Y
role is usually a 'local' transcription factor which controls
specific systems, and the Z nodes are the regulated genes
which share a specific function. Often, multi-output
FFLs in E. coli that respond to specific stimuli have
a non-homologous multi-output FFL counterpart in
yeast which responds to similar stimuli. The fact that
the genes in these circuits are not evolutionarily related,
whereas their connectivity patterns are the same in the
two organisms, suggests convergent evolution to the
same regulation pattern. Examples include
systems that respond to carbon limitation, drugs, and
nitrogen starvation in both organisms (tables 2,3).
Multi-output FFLs can also appear in systems that
make up a protein machine. For example, a multi-output
FFL in E. coli controls genes whose products make up
the flagellar basal-body motor (X = flhDC, Y = fliA,
Z = class 2 flagella genes). We find that the multi-output
FFL can serve as a persistence detector for all the
outputs. In addition it can generate temporal orders of
output gene expression.

A different FFL generalization, the multi-input FFL,
is found in the neuronal synaptic wiring of C. elegans.
This network is found to chiefly display the multi-input
FFL (Fig. 2c), and not the other two generalizations.
The multi-input FFL has a number of input nodes
$X_1..X_m$, a single internal node Y (secondary input)
and a single output node Z. As an example we have
mentioned the backward locomotion control circuit of
the worm. This circuit is governed by two ventral-cord
command interneurons, AVD and AVA. These
two neurons are linked in a multi-input FFL with several
input neurons, such as ASH and FLP (Fig. 4b), which
are head sensory neurons sensitive to nose touch and
noxious chemicals. This circuit implements an
avoidance reflex, eliciting backward motion in response
to head stimulation. We find that the multi-input FFL
can serve as a persistence detector for each input. In
addition, it can serve as a coincidence detector for weak
inputs, firing only if short stimuli from two or more
different inputs occur within a certain time of each other.

A different FFL generalization, with two inputs and
two outputs, appears in a class of electronic circuits.
This motif generalization functions within an XOR
gate. This demonstrates that network motifs and their
generalizations can be used to detect basic functional
building blocks of a network without prior knowledge.

Motif generalizations cover a substantial portion of 
the high-order motifs in various biological and
technological networks we have studied. However, motif
generalizations in the present form do not cover all
possible types of families of structures that share similar 
architectural themes. It would be important to find 
additional rules for defining families of motifs beyond the 
current notion of motif generalization by role replication. 
Motifs and their generalizations can help us understand 
the design principles of complex networks by defining 
functional building blocks whose function can be tested 
experimentally. 

To summarize, this study presented topological gener- 
alizations of network motifs, and an efficient algorithm to 
detect them. We found motif generalizations in several 
real-world networks. Networks that share the same motif
were found to exhibit different generalizations of that mo- 
tif. We demonstrated theoretically that the generalized 
motifs in biological networks can carry out information- 
processing functions. 
APPENDIX A: ROLES IN A SUBGRAPH - 
FORMAL DEFINITION 

We classify nodes in a subgraph into structurally
equivalent classes. Each class represents a role. The measure
of structural equivalence that we use here is automorphic
equivalence [46, 47, 48, 49, 50]. Let $S = (V_S, E_S)$
be a subgraph; then an automorphism is a one-to-one
mapping, $\tau$, from $V_S$ to $V_S$, such that $(v_i, v_j) \in E_S$ if
and only if $(\tau(v_i), \tau(v_j)) \in E_S$. Two nodes $v_i$ and $v_j$ are
automorphically equivalent if and only if there is some
automorphism, $\tau$, that maps one of the nodes to the
other ($\tau(v_i) = v_j$). For each subgraph S, we classify all
its n nodes into roles by examining structural equivalence
of all possible pairs of the nodes. By the transitivity of
automorphic equivalence, we are guaranteed to get a
partition of the nodes into distinct roles. This concept can
be readily generalized for networks with weights on the
edges or with different types of nodes.



APPENDIX B: SUBGRAPH GENERALIZATION - 
FORMAL DEFINITION 

Let S be the basic subgraph, where $r_1..r_L$ are the set of
roles of S with multiplicity $(d_1, .., d_L)$ respectively. A
simple generalization of S is a subgraph which is formed by
replication of a single role and its edges to preserve the
role connectivity of S. Note that in a simple generalization
only a single role is replicated. A generalized form
of a subgraph is defined by a pair $(M, V^m)$, where M is
an $L \times L$ image matrix, which describes the connectivity
between roles. $M[i,j] = 1$ if there is an edge from
role i to role j (i is not equal to j), and $M[i,j] = 0$
otherwise. $M[i,i] = 0$ if there is no edge between every two
nodes of role i, $M[i,i] = 1$ if there is a single edge, and
$M[i,i] = 2$ if there is a mutual edge. $V^m \in N^L$ is an
L-dimensional vector which defines the multiplicity of each
role. The FFL, which is an example of a basic subgraph,
is represented by $(M_{FFL}, (1, 1, 1))$, where, with the roles
ordered as (X, Y, Z),

$$M_{FFL} = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$

and the vector (1,1,1) describes the role multiplicity:
in the basic FFL each of the three roles X,Y,Z appears
once. An FFL with two output nodes is represented by
the pair $(M_{FFL}, (1,1,2))$. An FFL with m output nodes
(m Z-role nodes) is represented by $(M_{FFL}, (1, 1, m))$ (Fig.
2c). Such a generalization has only one degree of freedom:
the multiplicity of the Z role in the structure. There are
cases, such as multiplicity of more than one role, where
we need an additional definition in order to distinguish
between different types of structures. For this we define
the generalization rule r. We define two possible
generalization rules: a strong generalization rule and a weak
generalization rule. An example of a strong and a weak
$(M_{FFL}, (2, 1, 2))$ generalization is illustrated in Fig. 2d.
If S is the basic n-node subgraph with a set of L roles
represented by the multiplicity vector $(d_1, .., d_L)$, then a basic
n-node set is every set of n nodes in the structure that
consists of $d_i$ nodes of role i (for all $1 \le i \le L$). For
example, every set of three nodes in the multi-output FFL,
consisting of the X node, the Y node and one of the Z role
nodes, is a basic n-node set. A strong generalization rule
requires that every basic n-node set in the structure
forms the basic subgraph S. A weak generalization rule
requires that every node in the structure participates
in at least one basic n-node set that forms S (Fig. 2d). Note that
weak generalization can represent more than one unique
structure of a given size.
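
As an illustration of the $(M, V^m)$ representation, the sketch below expands an image matrix and a multiplicity vector into the explicit edge list of the corresponding strong generalization. It is a sketch under stated assumptions rather than the authors' implementation: off-diagonal entries are read as edges directed from role i to role j, the role ordering (X, Y, Z) and the node naming scheme are illustrative, and diagonal entries equal to 1 are not handled because the text does not fix the direction of a single within-role edge.

```python
from itertools import product

# Image matrix of the FFL with roles ordered (X, Y, Z): X->Y, X->Z, Y->Z.
M_FFL = [[0, 1, 1],
         [0, 0, 1],
         [0, 0, 0]]


def strong_generalization(image_matrix, multiplicity, role_names):
    """Return (nodes_by_role, edges) for the strong generalization (M, V)."""
    nodes = {
        i: [f"{name}{k + 1}" for k in range(count)]
        for i, (name, count) in enumerate(zip(role_names, multiplicity))
    }
    edges = set()
    n_roles = len(role_names)
    for i, j in product(range(n_roles), repeat=2):
        entry = image_matrix[i][j]
        if i != j and entry == 1:
            # Every node of role i sends an edge to every node of role j.
            edges.update(product(nodes[i], nodes[j]))
        elif i == j and entry == 2:
            # Mutual edges between every two nodes of the same role.
            edges.update((u, v) for u in nodes[i] for v in nodes[i] if u != v)
        elif i == j and entry == 1:
            raise NotImplementedError("single within-role edges need a direction")
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = strong_generalization(M_FFL, (2, 1, 2), "XYZ")
    print(nodes)
    print(sorted(edges))
```

For the multiplicity vector (1, 1, m) this yields the multi-output FFL, and for (2, 1, 2) it yields the strong 5-node generalization in which every X,Y,Z triplet forms an FFL; weak generalizations correspond to subsets of these edges and are not enumerated here.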



APPENDIX C: ALGORITHM FOR MOTIF 
GENERALIZATIONS DETECTION 

We begin by finding the network motifs (significant
subgraphs) of size n (usually n = 3-4) in the network as
described in [3, 15] (application and source code are
available at http://www.weizmann.ac.il/mcb/UriAlon/).
For each motif, and for each of its roles, we prepare a list
of all the nodes that play that role. We then search
for all of the generalizations of each motif, using its
appearances in the network as starting points. This search
reduces computation time and enables the detection
of significant generalization forms of the basic motifs
that are beyond the reach of algorithms that attempt
to enumerate all subgraphs of a given size.
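
The per-role node lists can be sketched as below. The three FFL appearances are taken from the E. coli complexes listed in Table II. The tuple layout (X, Y, Z) and the helper name nodes_by_role are assumptions made for this illustration.

```python
from collections import defaultdict

# Three FFL appearances from Table II, written as (X, Y, Z) tuples.
appearances = [
    ("crp", "malT", "malEFG"),
    ("crp", "malT", "malS"),
    ("crp", "araC", "araBAD"),
]


def nodes_by_role(appearances, role_names=("X", "Y", "Z")):
    """Collect, for each role, the set of nodes that play it in any appearance."""
    roles = defaultdict(set)
    for appearance in appearances:
        for role, node in zip(role_names, appearance):
            roles[role].add(node)
    return dict(roles)


role_lists = nodes_by_role(appearances)
for role in ("X", "Y", "Z"):
    print(role, sorted(role_lists[role]))
```

Each list can then serve as the set of starting points when searching for generalizations that replicate that role.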

Id.  X          Y          Z                      Function                       Complex size
---  ---------  ---------  ---------------------  -----------------------------  ------------
1    arcA       appY       appCBA                 Anaerobic/stationary phase     1
2    crp        fucPIKUR   fucAO                  Fucose utilization             1
3    crp        fur        cirA                   Iron citrate uptake            1
4    crp        galS       mglBAC                 Carbon utilization             1
5    crp        malI       malXY                  Maltose utilization            1
6    crp        melR       melAB                  Melibiose utilization          1
7    hns        flhDC      fliAZY                 Flagella regulation            1
8    metJ       metR       metA                   Methionine biosynthesis        1
9    ompR-envZ  csgDEFG    csgBA                  Osmotic stress response        1
10   crp        caiF       caiTABCDE              Carnitine metabolism           2
                           fixABCX
11   crp        nagBACD    manXYZ                 Carbon utilization             2
                           nagE
12   himA       ompR-envZ  ompC                   Osmotic stress response        2
                           ompF
13   rpoN       fhlA       fdhF                   Formate hydrogen lyase system  2
                           hycABCDEFGH
14   rpoN       glnALG     glnHPQ                 Nitrogen utilization           2
                           nac
15   crp        malT       malEFG                 Maltose utilization            3
                           malK-lamB-malM
                           malS
16   crp        araC       araBAD                 Arabinose utilization          4
                           araE
                           araFGH
                           araJ
17   rob        marRAB     fumC                   Drug resistance                4
                           nfo
                           sodA
                           zwf
18   flhDC      fliAZY     flgBCDEFGHIJK          Flagella system                5
                           flhBAE
                           fliE
                           fliFGHIJK
                           fliLMNOPQR
19   fur        arcA       cydAB                  Anaerobic metabolism           7
                           cyoABCDE
                           focA-pflB
                           glpACB
                           icdA
                           nuoABCDEFGHIJKLMN
                           sdhCDAB-b0725-sucABCD

TABLE II: Feedforward loops in E. coli transcription network classified into multi-Z complexes. Complex size is
the number of operons (Z-role nodes) in the FFL generalization.

In order to compute the statistical significance of a
certain generalization of a motif S, we first find, for
each appearance of S in the network, the maximal-size
generalization in which it appears. Then we count the
cumulative number of times S appears in the union
of all the maximal generalizations (up to size k). In
order to verify that the generalization significance is
not due to many stand-alone appearances of the basic
subgraph (e.g. a single-Z FFL in the case of the multi-Z
FFL generalization), we subtract the number of times S
appears as a stand-alone structure in the network from
the cumulative results (note that in Fig. 3 we show the
results before subtraction). We compare these numbers
to the corresponding numbers in randomized networks
(here we used Z score > 2). It is important to note
that the randomized networks preserve the incoming,
outgoing and mutual edge degree for each node. The
networks are not constrained to have the same number
of 3-node or higher subgraphs as in the real network (in
[3], in contrast, 4-node motifs were detected based on
randomized networks that preserved 3-node subgraph
counts).
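
The comparison with randomized networks in this appendix comes down to a Z score on the corrected counts. The following minimal sketch shows that arithmetic. The numbers are hypothetical and chosen only for illustration. The use of the population standard deviation and the helper names are assumptions, and the counts for the randomized networks are assumed to have been obtained with the same counting procedure as for the real network.

```python
from statistics import mean, pstdev


def corrected_count(cumulative, stand_alone):
    """Cumulative appearances of S minus its stand-alone appearances."""
    return cumulative - stand_alone


def z_score(real_count, randomized_counts):
    """Distance of the real count from the randomized mean, in standard deviations."""
    mu = mean(randomized_counts)
    sigma = pstdev(randomized_counts)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("randomized counts have zero spread")
    return (real_count - mu) / sigma


def is_significant(real_count, randomized_counts, threshold=2.0):
    """Apply the Z score > 2 criterion used in the text."""
    return z_score(real_count, randomized_counts) > threshold


# Hypothetical counts, for illustration only.
real = corrected_count(cumulative=14, stand_alone=4)
randomized = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(real, randomized), is_significant(real, randomized))
```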

Id.  X       Y           Z         Function                                     Complex size
---  ------  ----------  --------  -------------------------------------------  ------------
1    TUP1    RME1        IME1      Meiosis                                      1
2    RIM101  IME1        DIT1      Sporulation                                  1
3    MIG1    HAP2-3-4-5  CYC1      Formation of apocytochromes                  1
4    MIG1    GAL4        GAL1      Galactokinase                                1
5    MIG1    CAT8        JEN1      Lactate uptake                               1
6    MIG2    CAT8        JEN1      (2X-FFL complex)                             1
7    GAT1    DAL80-GZF3  GAP1      Nitrogen utilization                         1
8    TUP1    ALPHA1      MFALPHA1  Mating factor alpha                          1
9    GAL11   ALPHA1      MFALPHA1  (2X-FFL complex)                             1
10   TUP1    ROX1        ANB1      Anaerobic metabolism                         2
                         CYC7
11   GLN3    GAT1        GAP1      Nitrogen utilization                         2
                         GLN1      Glutamate synthetase
12   GLN3    GAT1        DAL80     Nitrogen utilization                         2
                         GLN1      Glutamate synthetase
13   GLN3    DAL80-GZF3  GAP1      Nitrogen utilization                         2
                         UGA4
14   PDR1    YRR1        SNQ2      Drug resistance                              2
                         YOR1
15   GCN4    MET4        MET16     Methionine biosynthesis                      2
                         MET17
16   HAP1    ROX1        ERG11     Anaerobic metabolism                         3
                         HEM13
                         CYC7
17   SPT16   SWI4-SWI6   CLN1      Cell cycle and mating type switch            3
                         CLN2
                         HO
18   GCN4    LEU3        ILV1      Leucine and branched amino acid biosynthesis 4
                         ILV2
                         ILV5
                         LEU4
19   UME6    INO2-INO4   CHO1      Phospholipid biosynthesis                    4
                         CHO2
                         INO1
                         OPI3
20   PDR1    PDR3        HXT11     Drug resistance                              6
                         HXT9
                         IPT1
                         PDR5
                         SNQ2
                         YOR1
21   GLN3    DAL80       CAN1      Nitrogen utilization                         15
                         DAL1
                         DAL2
                         DAL3
                         DAL4
                         DAL5
                         DAL7
                         DCG1
                         DUR1
                         DUR3
                         GDH1
                         PUT1
                         PUT2
                         PUT4
                         UGA1

TABLE III: Feedforward loops in S. cerevisiae transcription network classified into multi-Z complexes. Complex
size is the number of genes (Z-role nodes) in the FFL generalization.

The network is described by a directed interaction
graph G = (V, E), where V is the set of nodes and E
is the set of edges. An edge (v_i, v_j) ∈ E represents a
directed link between nodes v_i and v_j. For every n-node
subgraph S which is detected as a network motif [3, 15],
we search for its simple generalizations (multiplicity
of one of the roles). We begin by building an induced
graph G' = (V', E'). The nodes in G' are only those
that act as members (nodes) of S appearances in G,
and the edges are only the edges in G between these
nodes. G' is usually a much smaller graph than G, but
it contains all the information we need for our purpose.
For each simple generalization type j (multiplicity of the
j-th role of the subgraph) the following is performed: A
non-directed graph Ĝ = (V̂, Ê) is built where each node
represents a specific basic subgraph S in G (a specific
set of nodes in G that form a subgraph of type S). The
number of nodes in Ĝ equals the number of times S
appears in the original graph G. Two nodes in Ĝ are
connected if and only if they follow the generalization
type j and the generalization rule (strong or weak).
Setting the edges in Ĝ is done efficiently by using the
appearances of the basic subgraph in G' as starting
points. For each specific 'starting point' subgraph S_1
in G' we pass through all the 'neighboring' subgraphs
S_2 ('neighboring' in the sense that they share all node
roles excluding the j-th node role) and check whether the joint
subgraph (S_1 ∪ S_2) in G' forms a generalization of type
j. After setting all edges in Ĝ, the next step is to find
all maximal cliques j5l| (a clique is a group of nodes in which
every two are connected) in Ĝ. Each maximal clique
represents a maximal generalization of type j of S (i.e.
the generalization with the maximal number of appearances
of the basic subgraph). We store the size and the
members (nodes in the original network) of all maximal
generalizations. Complex generalizations (when more
than one role is replicated) were detected in a similar
way by appropriately changing the rules for setting the
edges in Ĝ.
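
The clique-based search can be sketched for one case, the multi-Z generalization of the FFL under the strong rule. This is an illustrative reading and not the authors' implementation. It assumes that FFL appearances are counted as induced subgraphs, that two appearances are compatible when they share X and Y and their Z nodes are not linked to each other, and it uses a basic Bron-Kerbosch enumeration as a stand-in for the clique algorithm cited in the text. All function names are hypothetical. The example edges follow complexes 1 and 15 of Table II.

```python
from itertools import combinations


def find_ffls(edges):
    """Return (x, y, z) triples with x->y, x->z, y->z and no reverse edges."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    ffls = []
    for x, targets in succ.items():
        for y in targets:
            for z in succ.get(y, ()):
                reverse = {(y, x), (z, x), (z, y)}
                if z != x and z in targets and not reverse & edges:
                    ffls.append((x, y, z))
    return ffls


def bron_kerbosch(clique, candidates, excluded, adj, out):
    """Basic Bron-Kerbosch enumeration of maximal cliques (no pivoting)."""
    if not candidates and not excluded:
        out.append(clique)
        return
    for v in list(candidates):
        bron_kerbosch(clique | {v}, candidates & adj[v], excluded & adj[v], adj, out)
        candidates = candidates - {v}
        excluded = excluded | {v}


def multi_z_generalizations(edge_list):
    """Group FFL appearances into maximal multi-Z complexes (x, y, [z, ...])."""
    edges = set(edge_list)
    ffls = find_ffls(edges)
    # Auxiliary undirected graph: one node per FFL appearance.
    adj = {f: set() for f in ffls}
    for f, g in combinations(ffls, 2):
        same_xy = f[:2] == g[:2]
        z_linked = (f[2], g[2]) in edges or (g[2], f[2]) in edges
        if same_xy and not z_linked:
            adj[f].add(g)
            adj[g].add(f)
    cliques = []
    bron_kerbosch(frozenset(), set(adj), set(), adj, cliques)
    result = []
    for clique in cliques:
        if clique:  # skip the empty clique reported for an empty graph
            x, y, _ = next(iter(clique))
            result.append((x, y, sorted(f[2] for f in clique)))
    return sorted(result)


edge_list = [
    ("crp", "malT"), ("crp", "malEFG"), ("crp", "malK-lamB-malM"), ("crp", "malS"),
    ("malT", "malEFG"), ("malT", "malK-lamB-malM"), ("malT", "malS"),
    ("arcA", "appY"), ("arcA", "appCBA"), ("appY", "appCBA"),
]
for complex_ in multi_z_generalizations(edge_list):
    print(complex_)
```

A clique of size one corresponds to a stand-alone FFL, and larger cliques give the multi-Z complexes. Other generalization types would need a different compatibility test when setting the edges of the auxiliary graph.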



APPENDIX D: NETWORK DATABASES

Transcription network of E. coli [15], version 1.1
(N=423, E=519), available at
http://www.weizmann.ac.il/mcb/UriAlon/, was
based on selected data from 0, |^ and literature.
Transcription network of yeast (S. cerevisiae) [11],
version 1.3 (N=685, E=1052), available at
http://www.weizmann.ac.il/mcb/UriAlon/, was
based on selected data from [7] (N = number
of nodes, E = number of edges). Self edges were
excluded. The neuronal synaptic connection network
of C. elegans (N=280, E=400) was based on H^l as
arranged in |8l|. The network was compiled with a
cutoff of at least 5 synapses for connections between
neurons. Target muscle cells were excluded. Electronic
forward-logic chips [14] were obtained by parsing
the ISCAS89 benchmark data set available at
www.cbl.ncsu.edu/CBL_Docs/iscas89.html. Bi-fan
generalizations data (Table I) are shown for chip S15850
(N=10383, E=14240), and are representative of all logic
chips in the database.